Imputing single-cell RNA-seq data by considering cell heterogeneity and prior expression of dropouts 
Lihua Zhang1,2, Shihua Zhang1,2,3,4,*
1 NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
4 Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
*Correspondence to: Shihua Zhang, Email: zsh@amss.ac.cn
J Mol Cell Biol, Volume 13, Issue 1, January 2021, 29-40, https://doi.org/10.1093/jmcb/mjaa052
Keywords: single-cell RNA-seq, dropout, imputation, low-rank, systems biology

Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for determining the expression patterns of thousands of individual cells. However, analyzing scRNA-seq data remains computationally challenging because of high technical noise, in particular dropout events, which produce a large proportion of zeros for genes that are actually expressed. Taking into account cell heterogeneity and the relationship between the dropout rate and the expected expression level, we present a cell sub-population based bounded low-rank (PBLR) method to impute the dropouts in scRNA-seq data. In applications to both simulated and real scRNA-seq datasets, PBLR effectively recovers dropout events and, compared with several state-of-the-art methods, dramatically improves both the low-dimensional representation of cells and the recovery of gene‒gene relationships masked by dropout events. Moreover, PBLR automatically detects accurate and robust cell sub-populations, demonstrating its flexibility and generality for scRNA-seq data analysis.
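The abstract states the idea only at a high level: impute within each cell sub-population, using a low-rank model whose imputed values are bounded. The following is a minimal, hedged sketch of that general idea, not the authors' PBLR algorithm. It assumes a genes-by-cells matrix, treats every zero as a candidate dropout and every nonzero as observed, takes cluster labels as already given, and substitutes a simple iterative truncated-SVD ("hard impute") scheme for whatever solver PBLR actually uses. The function names and the per-gene upper bound `upper` are hypothetical.

```python
import numpy as np


def bounded_lowrank_impute(X, rank, upper, n_iter=100):
    """Sketch: impute zeros of one sub-population's genes-by-cells matrix.

    Nonzero entries are held fixed as observed values. Zero entries are
    re-estimated from a rank-`rank` truncated SVD of the current estimate,
    then clipped into [0, upper[g]] for each gene (row) g. `upper` is a
    hypothetical per-gene bound assumed by this sketch.
    """
    X = np.asarray(X, dtype=float)
    bounds = np.asarray(upper, dtype=float)[:, None]  # shape (genes, 1)
    observed = X > 0
    Y = X.copy()
    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current estimate.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Keep imputed values nonnegative and below each gene's bound.
        L = np.clip(L, 0.0, bounds)
        # Observed entries stay fixed; only zeros take the low-rank estimate.
        Y = np.where(observed, X, L)
    return Y


def impute_by_subpopulation(X, labels, rank, upper, n_iter=100):
    """Sketch: run the bounded low-rank imputation separately per cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = X.copy()
    for k in np.unique(labels):
        cols = labels == k  # cells (columns) belonging to cluster k
        out[:, cols] = bounded_lowrank_impute(X[:, cols], rank, upper, n_iter)
    return out
```

Fitting each sub-population separately reflects the abstract's point about cell heterogeneity: a low-rank model is more plausible within a homogeneous group of cells than across all cells at once. How the clusters and the bounds are chosen is left open here; in this sketch both are supplied by the caller.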